Supplementary Material for "K-LITE: Learning Transferable Visual Models with External Knowledge"
This appendix is organized as follows. In Section A (referenced by the checklist), we discuss the societal impact. In Section B.1 (referenced by Section 4.1), we summarize the statistics of the datasets used in our experiments. In Section B.2 (referenced by Section 4), we introduce the pre-training and model adaptation settings. In Section B.3, we provide a zero-shot retrieval comparison when knowledge is introduced. In Section B.4, we provide a quantitative analysis of how external knowledge benefits transfer. In Section B.5 (referenced by Sections 4.2 and 4.3), we provide more visualizations of success cases. In Section B.6, we provide more object detection results by training on a larger dataset. We do not anticipate a specific negative impact, but, as with any machine learning method, we recommend exercising caution.
K-LITE: Learning Transferable Visual Models with External Knowledge
The new generation of state-of-the-art computer vision systems is trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, based on the broad concept coverage achieved through a large-scale data collection process. Alternatively, we argue that learning with external knowledge about images is a promising way to leverage a much more structured source of supervision and to offer sample efficiency. In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge for building transferable visual systems: in training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts; in evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones), enabling zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 existing datasets, respectively.
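The knowledge-augmentation step described above amounts to enriching a class name with its dictionary definition before it is fed to the text encoder. The snippet below is an illustrative sketch, not the K-LITE implementation: the `KNOWLEDGE` dictionary stands in for actual WordNet/Wiktionary lookups, and the prompt template is an assumption.

```python
# Illustrative sketch of knowledge-augmented prompts (not the K-LITE code).
# KNOWLEDGE stands in for WordNet/Wiktionary definition lookups.
KNOWLEDGE = {
    "abyssinian": "a breed of cat with a short reddish-brown coat",
    "stinkhorn": "a fungus with an offensive odor that attracts flies",
}

def enrich_prompt(concept: str) -> str:
    """Append external knowledge about a concept to a CLIP-style prompt."""
    base = f"a photo of a {concept}"
    definition = KNOWLEDGE.get(concept.lower())
    return f"{base}, {definition}" if definition else base

print(enrich_prompt("stinkhorn"))
# a photo of a stinkhorn, a fungus with an offensive odor that attracts flies
```

Concepts without an entry simply fall back to the plain prompt, which mirrors how knowledge coverage can be partial in practice.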
OpenCLIP for Image Search and Automatic Captioning
I have been using and writing about OpenAI's CLIP system since it came out in 2021 [1]. It consists of image and text encoding models that can be used for various forms of cross-modal comparison, such as using a text query to quickly find the best-matching image in a library. In December 2022, an independent group of researchers known as LAION released a paper called "Reproducible scaling laws for contrastive language-image learning" [2] that describes how they first reimplemented and trained a model similar to CLIP and then experimented with improving the system by training on a larger dataset and using new ML techniques. They call their new model OpenCLIP. In this article, I will provide some background info on the original CLIP, describe how LAION improved the model, and show some results from my experiments with the two systems using images from the Library of Congress's Flickr photostream.
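The text-to-image search described above boils down to embedding both modalities and ranking images by cosine similarity to the query. Below is a minimal NumPy sketch of that retrieval step; the embeddings here are random stand-ins for what the CLIP or OpenCLIP encoders would actually produce, so only the ranking logic is real.

```python
import numpy as np

# Stand-in embeddings: in practice these come from CLIP's image and
# text encoders (e.g. via the open_clip package); here they are random.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(1000, 512))  # library of 1000 images
text_embedding = rng.normal(size=(512,))         # one text query

def top_k_images(text_vec, image_vecs, k=5):
    """Rank images by cosine similarity to the text query."""
    t = text_vec / np.linalg.norm(text_vec)
    imgs = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    scores = imgs @ t                      # cosine similarities
    return np.argsort(scores)[::-1][:k]   # indices of best matches

print(top_k_images(text_embedding, image_embeddings))
```

Because both sets of vectors are normalized once, the dot product is the cosine similarity, and the whole library can be scored in a single matrix multiply, which is what makes CLIP-style search over large collections fast.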